cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs
Abstract
Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, a zkSNARK like Groth16 has a high overhead on its proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and multi-scalar multiplication (MSM). Therefore, this paper presents cuZK, an efficient GPU implementation of zkSNARK with the following three techniques to achieve high performance. First, we propose a new parallel MSM algorithm. This MSM algorithm achieves nearly perfect linear speedup over the Pippenger algorithm, a well-known serial MSM algorithm. Second, we parallelize the MUL operation. Along with our self-designed MSM scheme and the well-studied NTT scheme, cuZK achieves the parallelization of all operations in the proof generation step. Third, cuZK reduces the latency overhead caused by CPU-GPU data transfer by 1) reducing redundant data transfer and 2) overlapping data transfer and device computation. The evaluation results show that our MSM module provides over 2.08× (up to 2.94×) speedup versus the state-of-the-art GPU implementation. cuZK achieves over 2.65× (up to 4.86×) speedup on standard benchmarks and 2.18× speedup on a GPU-accelerated cryptocurrency application, Filecoin.
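For readers unfamiliar with the serial baseline named in the abstract, the following is a minimal sketch of a Pippenger-style (bucket method) MSM. It is not cuZK's parallel algorithm, which the abstract does not describe in detail. As a loudly stated assumption, the group here is a toy stand-in (integers modulo a prime under addition) so the example runs without an elliptic-curve library; the names `P`, `add`, `naive_msm`, `pippenger_msm`, `scalar_bits`, and `window` are hypothetical and chosen only for illustration.

```python
# Toy stand-in group (assumption): integers modulo a prime under addition.
# A real zkSNARK prover would use elliptic-curve points and point addition.
P = 2**61 - 1
IDENTITY = 0


def add(a, b):
    """Group operation of the toy group."""
    return (a + b) % P


def naive_msm(scalars, points):
    """Reference result: sum of k_i * G_i, computed directly."""
    acc = IDENTITY
    for k, g in zip(scalars, points):
        acc = add(acc, (k * g) % P)
    return acc


def pippenger_msm(scalars, points, scalar_bits=64, window=4):
    """Serial bucket-method MSM; scalars must be below 2**scalar_bits."""
    num_windows = (scalar_bits + window - 1) // window
    mask = (1 << window) - 1
    result = IDENTITY
    # Process windows from most significant to least significant.
    for w in reversed(range(num_windows)):
        # Shift the partial result left by `window` bits via doublings.
        for _ in range(window):
            result = add(result, result)
        # Bucket accumulation: points sharing a digit go into one bucket.
        buckets = [IDENTITY] * (1 << window)
        for k, g in zip(scalars, points):
            digit = (k >> (w * window)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], g)
        # Bucket reduction: compute sum_j j * buckets[j] using a running
        # sum, which needs only group additions.
        running = IDENTITY
        window_sum = IDENTITY
        for j in range(len(buckets) - 1, 0, -1):
            running = add(running, buckets[j])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result
```

The bucket step replaces per-point scalar multiplications with additions into shared buckets, which is why MSM over many points is cheaper than computing each product separately. The per-window loops are independent of one another apart from the final combination, which hints at where parallelism can be found, though how cuZK actually distributes the work is not stated in this abstract.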
Similar resources
Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs
Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs becomes a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO portion of the matrix is partitioned ...
Faster Implementation of Scalar Multiplication on Koblitz Curves
We design a state-of-the-art software implementation of field and elliptic curve arithmetic in standard Koblitz curves at the 128-bit security level. Field arithmetic is carefully crafted by using the best formulae and implementation strategies available, and the increasingly common native support to binary field arithmetic in modern desktop computing platforms. The i-th power of the Frobenius ...
A Faster Parallel Algorithm for Matrix Multiplication on a Mesh Array
Matrix multiplication is a fundamental mathematical operation that has numerous applications across most scientific fields. Cannon’s distributed algorithm to multiply two n-by-n matrices on a two-dimensional square mesh array with n² cells takes exactly 3n − 2 communication steps to complete. We show that it is possible to perform matrix multiplication in just 1.5n − 1 communication steps on a two...
Accelerating Radiosity on GPUs
We propose a novel approach to implement radiosity on GPU with specific optimizations via form-factor matrix transformations. The proposed transformations enable to reduce the amount of computations for multiple-bounce global illumination and apply DXT compression (with subsequent hardware decompression when reading formfactors on GPU). Our implementation is 10 times faster running and requires...
Journal
Journal title: IACR Transactions on Cryptographic Hardware and Embedded Systems
Year: 2023
ISSN: 2569-2925
DOI: https://doi.org/10.46586/tches.v2023.i3.194-220